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Abstract 

Implicit particle filtering is a sequential Monte Carlo method for data assim- 
ilation, designed to keep the number of particles manageable by focussing 
attention on regions of large probability. These regions are found by min- 
imizing, for each particle, a scalar fimction F of the state variables. Some 
previous implementations of the implicit filter rely on finding the Hessians 
of these functions. The calculation of the Hessians can be cumbersome if the 
state dimension is large or if the underlying physics are such that derivatives 
of F arc difficult to calculate. This is the case in many geophysical applica- 
tions, in particular for models with partial noise, i.e. with a singular state 
covariance matrix. Examples of models with partial noise include stochastic 
partial differential equations driven by spatially smooth noise processes and 
models for which uncertain dynamic equations are supplemented by con- 
servation laws with zero uncertainty. We make the implicit particle filter 
applicable to such situations by combining gradient descent minimization 
with random maps and show that the filter is efficient, accurate and reliable 
because it operates in a subspace whose dimension is smaller than the state 
dimension. As an example, we assimilate data for a system of nonlinear 
partial differential equations that appears in models of geomagnetism. 
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1 Introduction 



The task in data assimilation is to use available data to update the fore- 
cast of a numerical model. The numerical model is typically given by a 
discretization of a stochastic differential equation (SDE) 

= i?(x",t") + G(x",t")AW^"+\ (1) 

where x is an m-dimensional vector, called the state, t", n = 0, 1, 2, . . . , is 
a sequence of times, R is an m-dimensional vector function, G is an m x m 
matrix and AW is an m-dimensional vector, whose elements are independent 
standard normal variates. The random vectors G{x^ ,t^)AW^~^^ represent 
the uncertainty in the system, however even for G = the state may be 
random for any n because the initial state x^ can be random. The data 

are collected at times t'^^''\ I = 1, 2, ... ; for simplicity, we assume that the 
data are collected at a subset of the model steps, i.e. q{l) = rl, with r > 1 
being a constant. In the above equation, z is a fc-dimensional vector {k < m), 
/i is a fe-dimensional vector function, V is a /c-dimensional vector whose 
components are independent standard normal variates, and Q is a k x k 
matrix. Throughout this paper, we will write x^'" for the sequence of vectors 

Data assimilation is necessary in many areas of science and engineering 
and is essential in geophysics, for example in oceanography, meteorology, 
geomagnetism or atmospheric chemistry (see e.g. the reviews (5| [l8|[20|[27[ 



28,39]). What makes the assimilation of data in geophysical applications 



difficult is the complicated underlying physics, which lead to a large state 
dimension m and a nonlinear function R in equation Q. 

If the model ([T]) as well as /i in ([2]) are linear and if, in addition, the 
initial state x^ is Gaussian, then the probability density function (pdf) of the 
state x" is Gaussian for any n and can be characterized in full by its mean 
and covariance. The Kalman filter (KF) sequentially computes the mean 
of the model ([T]), conditioned on the observations and, thus, provides the 



best linear unbiased estimate of the state 23 . The ensemble Kalman filter 



(EnKF) is a Monte Carlo approximation of the Kalman filter and can be 
obtained by replacing the state covariance matrix by the sample covariance 
matrix in the Kalman formalism. The state covariance is the covariance 
matrix of the pdf of the current state conditioned on the previous state 
which we calculate from the model Q to be: 

p(x"+^ I x") ~ AA(i?(x", t"), G(x", r)G(x", ef), (3) 
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where A/'(/i, S) denotes a Gaussian with mean /i and covariance matrix S. 
To streamhne the notation we write for the state covariance 



= G(x",t")G(x",t"f , (4) 

where T denotes a transpose. In the EnKF, the sample covariance matrix is 
computed from an "ensemble," by running the model ([T]) for different realiza- 
tions of the noise process AW. The Monte Carlo approach avoids the com- 
putationally expensive step of updating the state covariance in the Kalman 
formalism. Both KF and EnKF have extensions to nonlinear, non-Gaussian 
models, however they rely on linearity and Gaussianity approximations [22| . 

Variational methods [3|[Tl|[l2}|36}{38||42] aim at assimilating the obser- 
vations within a given time window by computing the state trajectory of 
maximum probability. The trajectory is computed by minimizing a suit- 
able cost function which is, up to a normalization constant, the logarithm 
of the pdf of the state trajectory = x'^''^*^'^ given the set of observations z^, 
p^^O:q{i) I In particular, 3D-Var methods assume a static model 



36 



Strong constraint 4D-Var determines an optimal initial state given a "per- 
fect" dynamic model, i.e. G = 0, and a Gaussian initial uncertainty, i.e. 
~ 7V(^°,i;0) [ii|[i2||36||37|. Uncertain models with G / are tackled 



with a weak constraint 4D-Var approach (3 |38p2 . Many implementations of 



variational methods compute the gradient of the cost function from tangent 
linear adjoint equations and rely on linear approximations. 

For the reminder of this paper, we focus on sequential Monte Carlo 
(SMC) methods for data assimilation, called particle filters [H[7|[8| [l4|[l5| 
19 ,|29-31,40 ,41 . Particle filters do not rely upon linearity or Gaussianity 



assumptions and approximate the pdf of the state given the observations, 
p(^rj.O:q{i) I 2^-'), by SMC. The state estimate is a statistic (e.g. the mean, 
median, mode etc.) of this pdf. Most particle filters rely on the recursive 
relation 

(5) 

In the above equation p(^x^'-i^^+'^) | z^-'+i) is the pdf of the state trajectory 
up to time given all available observations up to time 

tgC+i) and IS 

called the target density; p{z^~^'^ \ a;'?^'"'"-^)) is the probability density of the 
current observation given the current state and can be obtained from ([2]): 

p(z'+^ I rE^('+i)) ~ AA(/i(x5«,t^«),S^), (6) 

with 

S- = Q(x",r)Q(x",t")^. (7) 
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The pdf p(^x'^W+^-i^'-+^) I x*^^^^) is the density of the state trajectory from the 
previous assimilation step to the current observation, conditioned on the 
state at the previous assimilation step, and is determined by the model ([T]). 

A standard version of the sampling-importance-resampling (SIR) parti- 
cle filter (also called bootstrap filter, see e.g. [l4|) generates, at each step, 
samples fromp(x''''^'*"'"^''^^'"'"^^ | x*^*^'^) (the prior density) by running the model. 
These samples (particles) are weighted by the observations with weights 
w oc p{z^~^^ I x''^'"^-'^)), to yield a posterior density that approximates the 
target density p{x^''^^''^ \ z^''). One then removes particles with a small 
weight by "resampling" (see e.g. [I] for resampling algorithms) and repeats 
the procedure when the next observation becomes available. This SIR fil- 
ter is straightforward to implement, the catch is that many particles have 
small weights because the particles are generated without using information 
from the data. If many particles have a small weight, the approximation of 
the target density is poor and the number of particles required for a good 
approximation of the target density can grow catastrophically with the di- 
mension of the state [4 ,34 . Various methods, e.g. different prior densities 
and weighting schemes, have been invented to ameliorate this problem (see 
e.g. (I4|[39}|4l]). 

The basic idea of implicit particle filters Q|8 31 is to use the available 
observations to find regions of high probability in the target density and look 
for samples within this region. This implicit sampling strategy generates a 
thin particle beam within the high probability domain and, thus, keeps the 
number of particles required manageable, even if the state dimension is large. 
The focussing of particles is achieved by setting up an underdetermined 
algebraic equation that depends on the model ([T]) as well as on the data 
([2]), and whose solution generates a high probability sample of the target 
density. We review the implicit filter in the next section, and it will become 
evident that the construction assumes that the state covariance S" in Q is 
nonsingular. This condition is often not satisfied. If, for example, one wants 
to assimilate data into a stochastic partial differential equation (SPDE) 
driven by spatially smooth noise, then the continuous-time noise process 
can be represented by a series with rapidly decaying coefficients, leading 
to a non-singular or ill-conditioned state covariance in discrete time and 
space (see Sections 3.1 and[4| as well as [lOpTpG] ). A second important class 
of models with partial noise are uncertain dynamic equations supplemented 
by conservation laws (e.g. conservation of mass) with zero uncertainty. Such 
models often appear in data assimilation for fluid dynamics problems [25] . 

The purpose of the present paper is two-fold. First, in Section [2j we 
present a new implementation of the implicit particle filter. Most previous 
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implementations of the implicit filter [7 ,31 rely in one way or another on 
finding the Hessians of scalar functions in rm variables. For systems with 
very large state vectors and considerable gaps between observations, mem- 
ory constraints may forbid a computation of these Hessians. Our new imple- 



mentation combines gradient descent minimization with random maps 31 



to avoid the calculation of Hessians, and thus reduces the memory require- 
ments. 

The second objective is to consider models with a singular or ill-conditioned 
state covariance where previous implementations of the implicit filter, as 
described in [7||8 31 , are not applicable. In Section [sj we make the implicit 
filter applicable to models with partial noise and show that our approach 
is then particularly efficient, because the filter operates in a space whose 
dimension is determined by the rank of S", rather than by the model di- 
mension. We compare the new implicit filter to SIR, EnKF and variational 
methods, in particular with respect to how information is propagated from 
observed variables to unobserved ones. 

In Section |4j we illustrate the theory with an application in geomag- 
netism and consider two coupled nonlinear SPDE's with partial noise. We 
observe that the implicit filter gives good results with very few (4-10) par- 
ticles, while EnKF and SIR require hundreds to thousands of particles for 
similar accuracy. 



2 Implicit sampling with random maps 

We first follow [3l] closely to review implicit sampling with random maps. 
Suppose we are given a collection of M particles xf^\ j = 1,2,..., M, 

whose empirical distribution approximates the target density at time 
where q{l) = rl, and suppose that an observation is available after r 
steps at time 

fq{l+i) ^ f'Ci+i). From O we find, by repeatedly using Bayes' 
theorem, that, for each particle, 

X p{xf+'\xf). (8) 

Implicit sampling is a recipe for computing high-probability samples from 
the above pdf. To draw a sample we define, for each particle, a function Fj 
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by 



exp(-F(X,)) = p(xf I Xf^'^~') • --pixf^' I Xf] 

X p(z'+Vf^'^ (9) 

where is shorthand for the state trajectory Xj^''^^^''^^''^^\ Specificahy, 
we have 

F^ix,) = [xf^' - Rff {j:^^y\xf^' - Rf 



(10) 

where i?" is shorthand notation for R{X'^,t"') and where Zj is a positive 
number that can be computed from the normahzation constants of the var- 
ious pdf's in the definition of Fj in ([o]). With this Fj, we solve the algebraic 
equation 

F(X,)-<^, = ieJe,, (11) 

where is a realization of the rm— dimensional reference variable ^ ~ 
AA(0, /), and where 

(t)j = m.\D.Fj. (12) 

The choice of a Gaussian reference variables does not imply linearity or 
Gaussianity assumptions and other choices are certainly possible. We find 
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solutions of (11) by using the random map 

Xj = fij + XjLjTij, (13) 

where Xj is a scalar, fj,j is an rm-dimensional column vector which represents 
the location of the minimum of Fj, i.e. /ij = argminFj, Lj is a deterministic 



rm X rm matrix we can choose, and rjj = ^jj < / M ^j, is uniformly distributed 



on the unit rm-sphere. Upon substitution of (13) into (11), we can find a 
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solution of (11) by solving a single algebraic equation in the variable Xj . 
The weight of the particle can be shown to be (see 31 , Section 3) 



rm-l (^^i 



^ dp, 



(14) 



where pj = ^JS^j and det Lj denotes the determinant of the matrix Lj (see 
|3l| for details of the calculation). An expression for the scalar derivative 



dXj/dpj can be obtained by implicit differentiation of (11): 



dp, 2(VF,)Ljr/,' 



(15) 



where VFj denotes the gradient of Fj (an rm-dimensional row vector). 

The weights are normalized so that their sum equals one. The weighted 
positions Xj of the particles approximate the target pdf. We compute the 
mean of Xj with weights Wj as the state estimate, and then proceed to assim- 
ilate the next observation. The method just described makes use of only one 
set of observations per assimilation step, however an extension to multiple 
observation sets per assimilation step (smoothing) is straightforward. 

2.1 Implementation of an implicit particle filter with gradi- 
ent descent minimization and random maps 

An algorithm for data assimilation with implicit sampling and random maps 



was presented in 31 . This algorithm relies on the calculation of the Hessians 
of the -Fj's and the Hessians are used for minimizing the Fj^s with Newton's 
method and for setting up the random map. The calculation of the Hessians 
however may not be easy in some applications because of a very large state 
dimension, or because the second derivatives are hard to calculate, as is the 
case for models with partial noise (see Section [s]). To avoid the calculation of 
Hessians, we propose to use a gradient descent algorithm with line-search to 
minimize the Fj's (see e.g. [32]), along with simple random maps. Of course 
other minimization techniques, in particular quasi-Newton methods (see e.g. 



[16,32 , can also be applied here and perhaps speed up the minimization. 
However, we decided to use gradient descent with line search to keep the 
minimization as simple as possible. 

For simplicity, we assume that G and Q in are constant matrices 



and calculate the gradient of Fj from (10): 
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with 



dF ^ ^ 



( X'' - R^-^ 



,dR 



..x^f^-Ux'^'-R')^ (17) 



for k = q{l) + l,q{l) + 2, . . . ,g(/ + l)-l, where i?" is shorthand for i?(X",t"), 
and where 



fdh 
\dx 



(18) 



Here, we dropped the index j for the particles for notational convenience. 
We initiahze the minimization using the result of a simplified implicit parti- 
cle filter (see next subsection). Once the minimum is obtained, we substitute 



the random map (13) with Lj = /, where / is the identity matrix, into (11) 
and solve the resulting scalar equation by Newton's method. The scalar 
derivative we need for the Newton steps is computed numerically. We ini- 
tialize this iteration with Aj = 0. Finally, we compute the weights according 
to ([14). If some weights are small, as indicated by a small effective sample 
size [1 , 

M.// = 1/ (-r^T) (1^) 

we resample using Algorithm 2 in |1|. The implicit filtering algorithm with 
gradient descent minimization and random maps is summarized in pseudo- 
code in Algorithm [TJ 

This implicit filtering algorithm shares with weak constraint 4D-Var that 
a "cost function" (here Fj) is minimized by gradient descent. However, most 
4D-Var implementations use tangent linear adjoint equations to compute 
the gradient. In the implicit filtering Algorithm [T| we do a fully nonlinear 
calculation of the gradient. Two further differences between 4D-Var and 
Algorithm [T] are (i) 4D-Var does not update the state sequentially, but the 
implicit particle filter does and, thus, reduces memory requirements; {ii) 
4D-Var computes the most likely state by minimizing the cost function, and 
this estimate can be biased; the implicit particle filter approximates the 



8 



Algorithm 1 Implicit Particle Filter with Random Maps and Gradient 
Descent Minimization 

{Initialization, t = 0} 
for J = 1, ... ,M do 

• sample Xj ~ po(^) 
end for 



{Assimilate observation z } 
for j = 1, . . . ,M do 

• Set up and minimize Fj using gradient descent to compute (pj and fij 



• Sample reference density ~ ^"(0, /) 

• Compute pj = ^J^j and rjj = / ^J~pj 

• Solve (11) using the random map (13) with Lj 

• Compute weight of the particle using (14) 

• Save particle Xj and weight Wj 
end for 



• Normalize the weights so that their sum equals 1 

• Compute state estimate from Xj weighted with Wj (e.g. the mean) 

• Resample if M^jf < c 

• Assimilate z^~^^ 
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target density and, thus, can compute other statistics as state estimates, in 
particular the conditional expectation, which is, under wide conditions, the 
optimal state estimate (see e.g. |9j). 

2.2 A simplified implicit particle filtering algorithm with 
random maps and gradient descent minimization 

We wish to simplify the implicit particle filtering algorithm by reducing 
the dimension of the function Fj. The idea is to do an implicit sampling 
step only at times t'^^^~^^\ i.e. when an observation becomes available. The 
state trajectory of each particle from time t"^^'^ (the last time an observa- 
tion became available) to is generated using the model equations 
([T]). This approach reduces the dimension of Fj from rm to m (the state 
dimension) . The simplification is thus very attractive if the number of steps 
between observations, r, is large. However, difficulties can also be expected 
for large r: the state trajectories up to time are generated by the 
model alone and, thus, may not have a high probability with respect to the 
observations at time t''^''^^\ The focussing effect of implicit sampling can 
be expected to be less emphasized and the number of particles required may 
grow as the gap between observations becomes larger. Whether or not the 
simplification we describe here can reduce the computational cost is prob- 
lem dependent and we will illustrate advantages and disadvantages in the 
examples in Section |4j 

Suppose we are given a collection of M particles xf\ j = 1, 2, . . . , M, 

whose empirical distribution approximates the target density at time t"^^'^ 
and the next observation, z'"*"^, is available after r steps at time t'^^''^^\ For 
each particle, we run the model for r— 1 steps to obtain Xj^^^^^, . . . , Xj^^^^^ ^. 



We then define, for each particle, a function Fj by 



+ Zj (20) 



whose gradient is given by (18). The algorithm then proceeds as Algorithm 
1 in the previous section: we find the minimum of Fj using gradient descent 
and solve (11) with the random map (13) with Lj = I. The weights are 



calculated by (14) with r = 1 and the mean of Xj weighted by Wj is the 
state estimate at time t'^^^^'^\ 
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This simplified implicit filter simplifies further if the observation function 



is linear, i.e. h{x) = Hx, where H is a k x m matrix. One can show 31 



that the minimim of Fj is 

cP, = - HRf+'^-YKr\z'+' - HRf^'^-'), (21) 

with 

K,=HT.lf^'^-^H'^ + T}+J. (22) 

A numerical approximation of the minimum is thus not required. The loca- 
tion of the minimum is 



with 



s7^ = fe(;+^)-rVi?^(4?)-^^. (24) 



Moreover, if Lj is a Cholesky factor of Sj, then Xj = fij + Lj^j solves (11) 
and the weights simplify to 

^7+^ oc w] exp{-(j)j) |det Lj \ . (25) 

For the special case of a linear observation function and observations avail- 
able at every model step (r = 1), the simplified implicit filter is the full 
implicit filter and reduces to a version of optimal importance sampling 
fTl[5l[7l|3Tl. 



3 Implicit particle filtering for equations with par- 
tial noise 

We consider the case of a singular state covariance matrix Ylx in the context 
of implicit particle filtering. We start with an example taken from ^21j, to 
demonstrate how a singular state covariance appears naturally in the con- 
text of SPDE's driven by spatially smooth noise. The example serves as a 
motivation for more general developments in later sections. Another class 
of models with partial noise consists of dynamical equations supplemented 
by conservation laws. The dynamics are often uncertain and thus driven by 
noise processes, however there is typically zero uncertainty in the conserva- 
tion laws (e.g. conservation of mass), so that the full model (dynamics and 
conservation laws) is subject to partial noise |25|. 
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3.1 Example of a model with partial noise: the semi-linear 
heat equation driven by spatially smooth noise 



We consider the stochastic semi-hnear heat equation on the one-dimensional 
domain x G [0, 1] over the time interval t £ [0, 1] 

du d'^u „, , dWt 
di 



+ T{u) + 



(26) 



where F is a continuous function, and Wt is a cylindrical Brownian motion 



(BM) [21]. The derivative dWt/dt in (26) is formal only (it does not exist in 



the usual sense). Equation (26) is supplemented by homogeneous Dirichlet 
boundary conditions and the initial value u{x,0) = Uo{x). We expand the 
cylindrical BM Wt in the eigenfunctions of the Laplace operator 



(27) 



it=i 



where (3^ denote independent BM's and where the coefficients qk > must 
be chosen such that, for 7 € (0, 1), 



k=l 



27- 
k 



Qk < 00, 



(28) 



where are the eigenvalues of the Laplace operator 21 . If the coefficients 



Qk decay fast enough, then, by (27) and basic properties of Fourier series. 



the noise is smooth in space and, in addition, the sum ( 28 ) remains finite as 



is required. For example one may be interested in problems where 

e"^'^, if A; < c, 
0, if A; > c. 



Qk 



(29) 



for some c > 0. 

The continuous equation must be discretized for computations and here 
we consider the Galerkin projection of the SPDE into an m-dimensional 
space spanned by the first m eigenfunctions of the Laplace operator 



dur 



{Amur+r^iun)dt+dwi' 



(30) 



where U[^, F^ 



and Wf^ are m-dimensional truncations of the solution, the 



function F and the cylindrical BM Wt respectively, and where Am is a dis- 



cretization of the Laplace operator. Specifically, from (27) and (29), we 
obtain: 



dWt"" = V2e-^ sm{kTTx)d(3^ . 



(31) 



k=l 
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After multiplying with the basis functions and integrating over the spatial 
domain, we are left with a set of m stochastic ordinary differential equations 



dx = f{x)dt + gdW (32) 
where x is an m-dimensional state vector, / is a nonlinear vector function, 



is a BM. In particular, we calculate from (31): 

5 = -^diag((e^\e-2,...,e-^0,0,...,0)) , c < m, (33) 

where diag(o) is a diagonal matrix whose diagonal elements are the com- 
ponents of the vector a. Upon time discretization using, for example, a 
stochastic version of forward Euler with time step 6 |24|, we arrive at ([T]) 
with 

i?(x) =x" + 5/(x"), G{x) = V~5g. (34) 

It is now clear that the state covariance matrix = GG^ is singular for 
c < m. 

A singular state covariance causes no problems for running the discrete 
time model ([T]) forward in time. However problems do arise if we want to 
know the pdf of the current state given the previous one. For example, the 
functions Fj in the implicit particle filter algorithms (either those in Section 
[2| or those in ^^^^) are not defined for singular S^,-. If c > m, then Tix is ill- 



conditioned and causes a number of numerical issues in the implementation 
of these implicit particle filtering algorithms and, ultimately, the algorithms 
fail. 

3.2 Implicit particle filtering of models with partial noise, 
supplemented by densely available data 

We start deriving the implicit filter for models with partial noise by con- 
sidering the special case in which observations are available at every model 
step (r = 1). For simplicity, we assume that the noise is additive, i.e. 
G{x^,t^) = G = constant and that Q in ([2]) is a constant matrix. Un- 
der these assumptions, we can use a linear coordinate transformation to 
diagonalize the state covariance matrix and rewrite the model ([T]) and the 
observations ^ as 

x^+i = /(x",2/",t") + A^"+\ AVF"+i ~ AA(0, S^) (35) 
= 5(x^y^^"), (36) 
z'^+i = /i(a;",y") + gy", (37) 
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where x is a p-dimensional column vector, p < m is the rank of the state 
covariance matrix Q, and where / is a p-dimensional vector function, T,x 
is a non-singular, diagonal p x p matrix, y is a (m — p)-dimensional vector, 
and g is a (m — p)-dimensional vector function. For ease of notation, we 



drop the hat above the "new" state covariance matrix T,x in (35) and, for 
convenience, we refer to the set of variables x and y as the "forced" and 
"unforced variables" respectively. 

The key to filtering this system is observing that the unforced variables 
at time t""*"^, given the state at time t", are not random. To be sure, y"" is 
random for any n due to the nonlinear coupling g(x, y), but the conditional 
pdf p[y'^~^^ I is the delta-distribution. For a given (not random) 

initial state x^ , y^, the target density is 

X p(z"+i I I x", y"). (38) 

Suppose we are given a collection of M particles, X", Y^, j = 1, 2, . . . , M, 
whose empirical distribution approximates the target density p{x^''^ , y^-^ \ 
z^'"") at time t". The pdf for each particle at time i""*"^ is thus given by 



(38) with the substitution of Xj for x and Yj for y. In agreement with the 
definition of Fj in previous implementations of the implicit filter, we define 
Fj here by 

exp(-F,-(x;+i)) = p(z"+i I | x;, y/). (39) 

More specifically, 

+ Zj (40) 

where /" is shorthand notation for /(X", 1^", t"). With this Fj, we can use 
Algorithm [T] to construct the implicit filter. For this algorithm we need the 
gradient of Fj : 
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Note that Y^'^^ is fixed for each particle, if its previous state, (X",!^") 



n+l 



is known, so that the filter only updates when the observations 

become available. The unforced variables of the particles, Yp~^^, are moved 
forward in time using the model, as they should be, since there is no uncer- 
tainty in y'^+i given x",?/". The data are used in the state estimation of y 
indirectly through the weights and through the nonlinear coupling between 
the forced and unforced variables of the model. If one observes only the 
unforced variables, i.e. h{x,y) = h{y), then the data is not used directly 



when generating the forced variables, , because the second term in (40) 



is merely a constant. In this case, the implicit filter becomes equivalent to 
a standard SIR filter, with weights wj~^^ = exp{—(l)j). 

This implementation of the implicit filter is numerically effective for fil- 
tering systems with partial noise, because the filter operates in a space of 
dimension p (the rank of the state covariance matrix), which typically is 
less than the state dimension (see the example in Section |4]). The use of a 
gradient descent algorithm and random maps further makes the often costly 
computation of the Hessian of Fj unnecessary. If h is linear no iterative 
minimization is required. 

If the state covariance matrix is ill-conditioned, a direct implementa- 
tion of Algorithm [T] is not possible. We propose to diagonalize the state 
covariance and set all eigenvalues below a certain threshold to zero so that 
a model of the form ([35|)-(37) can be obtained. In our experience, such 



approximations are accurate and the filter of this section can be used. 

3.3 Implicit particle filtering for models with partial noise, 
supplemented by sparsely available data 



We extend the results of Section [3.2| to the more general case of observations 
that are sparse in time. Again, the key is to realize that y""*"^ is fixed given 
x^,y^. For simplicity, we assume additive noise and a constant Q in ([2]), so 
that the target density becomes 

X p(x''('+i) I x«('+i)-\y«('+i)-i) 
X p(x'?('+i)-i I a;5a+i)-2,y9('+i)-2) 

X p(x'?(')+^ I 
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Given a collection of M particles, X", 1^", j = 1, 2, . . . , M, whose empirical 
distribution approximates the target density p{x ''^ >, at time 

, we define, for each particle, the function Fj by 



X p{xf+'\xf\Yf^) (42) 
where is shorthand for Xj^^^^^'"'''^^''~^^\ so that 

F,iX,) = I [Xf^' - fff^-' [Xf^' - ff 



+ l{h[xf^'\Yf^''>)-z^^'Y^-' 

h(^xf^'\Yf+'^)-z'+')+Zj. (43) 

At each model step, the unforced variables of each particle depend on the 
forced and unforced variables of the particle at the previous time step, so 
that Yf'^'^ is a function of xf\xf^\...,xf^'^-' and ff^'^ is a 

function of x^^'-*^"^, X,^^^''^^, . . . ,X'il^^^^\ The function F,- thus depends on 
the forced variables only. However, the appearances of the unforced variables 
in Fj make it rather difficult to compute derivatives. The implicit filter with 
gradient descent minimization and random maps (see Algorithm [T]) is thus 
a good filter for this problem, because it only requires computation of the 
first derivatives of Fj , while other implementations (see 7|31] ) require second 
derivatives as well. 

The gradient of Fj is given by the rp-dimensional row vector 



g{l+l)-l 



. dFj dFj dFj . 
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with 



\rlt, fK, 




( X 



T 



+ -TT- fc+1 



k+l 



dy dX'' 




(df dj/^^Y y-i ( 



'J 



for A; = + . . . , q{l + l) — l and where (•) \k denotes "evaluate at time t'^." 
The derivatives dy'^/dXj, i = k + 1, . . . ,q{l), can be computed recursively 
while constructing the sum, starting with 

and then using 



Qyk + l Qg Qyl 1 

1)X[ ~ dx ''"^ dX^ 



i = k + 2,...,q{l). (47) 



The minimization of Fj for each particle is initialized with a free model 
run for r steps, with initial conditions given by the final position of the j^^ 
particle at the previous assimilation step. With this initial guess we compute 



the gradient using (44)- (47) and, after a line search and one step of gradient 
descent, obtain a new set of forced variables. We use this result to update 
the unforced variables by the model, and proceed to the next iteration. 
Once the minimum cpj and its location fij are found, we use the random 



map (|13|) with Lj = I to compute Xj^'^^^, . . . for this particle and 



then use these forced variables to compute Yj^^^~^^'"'''^^''~^^\ We do this for all 
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particles, and compute the weights from ( 14 ) with m = p, then normahze the 
weights so that their sum equals one and thereby obtain an approximation 
of the target density. We resample if the effective sample size Mejj is below 
a threshold and move on to assimilate the next observation. The implicit 
filtering algorithm is summarized with pseudo code in Algorithm [2] 

Algorithm 2 Implicit Particle Filter with Random Maps and Gradient 
Descent Minimization for Models with Partial Noise 

{Initialization, t = 0} 
for j = 1, . . . ,M do 

• sample ^ po{X) 
end for 



{Assimilate observation z'} 
for j = 1, . . . ,M do 

• Set up and minimize Fj using gradient descent: 
Initialize minimization with a free model run 
while Convergence criteria not satisfied do 

Compute gradient by (44) 
Do a line search 

Compute next iterate by gradient descent step 
Use results to update unforced variables using the model 
Check if convergence criteria are satisfied 
end while 

• Sample reference density ~ A/'(0, /) 

• Compute pj = ijij and rjj = ij/J~Pj 



• Solve (11) using random map (13) with Lj = I to compute Xj 

• Use this Xj and the model to compute corresponding Yj 

• Compute weight of the particle using (14) 

• Save particle {Xj,Yj) and weight Wj 
end for 



• Normalize the weights so that their sum equals 1 

• Compute state estimate from Xj weighted with Wj (e.g. the mean) 

• Resample if MEjf < c 

• Assimilate z^~^^ 



Note that all state variables are computed by using both the data and the 
model, regardless of which set of variables (the forced or unforced ones) is ob- 
served. The reason is that sparse observations induce a nonlinear coupling. 
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through / and g in (35)-(37), between the unforced and forced variables at 
the various model steps. It should also be noted that the function Fj is a 
function of rp variables (rather than rm), because the filter operates in the 
subspace of the forced variables. If the minimization is computationally too 
expensive, because p or r is extremely large, then one can easily adapt the 
"simplified" implicit particle filter of Section 2.2 to the situation of partial 
noise using the methods we have described above. If h is nonlinear, this 
simplified filter requires a minimization of a p-dimensional function for each 
particle. If h is linear, no numerical minimization is required. 



3.4 Discussion 

We wish to point out similarities and differences between the implicit filter 
and three other data assimilation methods. In particular, we discuss how 
data are used in the generation of the state estimates. It is clear that the 
implicit filter uses the available data as well as the model to generate the 
state trajectories for each particle, i.e. it makes use of the nonlinear coupling 
between forced and unforced variables. 

SIR and EnKF make less direct use of the data. In SIR, the particle 
trajectories are generated using the model alone and only later weighted by 
the observations. Data thus propagate to the SIR state estimates indirectly 
through the weights. In EnKF, the state trajectories are generated using the 
model and only the states at times t'^^'^ (when data are available) are updated 
by data. Thus, EnKF uses the data only to update its state estimates at 
times for which data are actually available. 

A weak constraint 4D-Var method is perhaps closest in spirit to the 
implicit filter. In weak constraint 4D-Var, a cost function similar to Fj is 
minimized (typically by gradient descent) to find the state trajectory with 
maximum probability given data and model. If one picks the time window 
for a 4D-Var assimilation from one observation to the next z'"*"^, then 
the use of the data is similar to the use of the data in an implicit filter, 
because, in both algorithms, the model as well as data are used to generate 
the state trajectories. In fact, one can view the implicit particle filter as 
a randomized and sequential version of weak constraint 4D-Var (or, one 
may interpret weak constraint 4D-Var as an implicit smoother with a single 
particle). These issues will be taken up in more detail in a subsequent 
paper [2]. 

Finally, we would like to discuss the implicit filtering algorithm for the 
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special case of a perfect model, i.e. 



= 9{y\n, (48) 
Ky'i^^^) + Q^VK (49) 



Following the steps above and, assuming we are given a collection of M 
particles, YJ^, j = 1,2, . . . , M , whose empirical distribution approximates 
the target density p(yO-'j(0 | z^'^) at time we define, for each particle, 
the function Fj by 

eM-Fj)=p{z'^'\Yf^'^), (50) 

so that 

+ Zj (51) 

Since Yf^^^ is fixed for each particle, Fj is merely a constant that is used to 
weigh the particle trajectory by the weight 

t^J.+i = ^.exp(-Fj). (52) 

The data are used indirectly here, because the initial condition determines 
the full state trajectory. However, this initial condition is fixed for each 
particle. For a perfect model, strong constraint 4D-Var makes more efficient 
use of the available data by using it to find an "optimal initial condition," 
compatible with the data. 



4 Application to Geomagnetism 

Data assimilation has been recently applied to geomagnetic applications and 
there is a need to find out which data assimilation technique is most suitable 
[Ts) . Thus far, a strong constraint 4D-Var approach ^17J and a Kalman filter 
approach [35] have been considered. Here, we apply the implicit particle 
filter to a test problem very similar to the one first introduced by Fournier 
and his colleagues in [T7|. The model is given by two SPDE's 

dtu + udxU = bdxb + ud^u + gudtW{x,t), (53) 
dtb + ud^b = bdxU + dlb + gbdtW{x,t), (54) 
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where, gu,gb are scalars, and where is a stochastic process (the derivative 
here is formal and may not exist in the usual sense). We study the above 
equations with ly = 10~^ as in [it], and with gu = 0.01, gb = 1, so that the 
uncertainty in the unobserved quantity is much larger than the uncertainty 
in the unobserved quantity. We consider the above equations on the strip 
< t < T, — l<x<l and with boundary conditions 

u{x,t) = 0, if X = ±1, ti(x, 0) = sin(7rx) + 2/5 sin(57rx), (55) 
b{x,t) = ±l, ifx = ±l, 6(x,0) = cos(7r2;) + 2sin(7r(x + l)/4). (56) 

Physically, u represents the velocity field and b represents the secular vari- 
ation of the magnetic field. The model is essentially the model proposed 
in [Tt], but with additive noise 

oo 

W{x, t) = ^ afc sm{kTTx)wl{t) + (3k cos{kTT /2x)wl{t). (57) 

fc=0 

where w\,w\ are independent BMs and where 

/ 1, iffc<10, 
"'= = ^'= = 10, ifA:>10. 

Here, we are content with this simple noise model that represents a small 
uncertainty at the boundaries of both fields and is spatially smooth. How- 
ever, it is straightforward to incorporate more information about the spatial 
distribution of the uncertainty by picking suitable coefficients a^, (3k. An 
illustration of the noise process is given in Figure [T] 

4.1 Discretization of the dynamical equations 



We follow [17] in the discretization of the dynamical equations, however we 
decided to present some details of the discretization to explain how the noise 
process W comes into play. 

For both fields, we use Legendre spectral elements of order N (see e.g. 
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Figure 1: The noise process W{x,t). Left: Noise plotted as a function of x 
and t. Right: Snapshot oiW{x,t = t). 



(6|[T3]), so that 
u(x,t) = 

b{x,t) = 

W{x,t) = 



where ipj are the characteristic Lagrange polynomials of order A^, centered 
at the jth Gauss-Lobatto-Legendre (GLL) node We substitute the ex- 
pansions into the weak form of (53) and (54) (no integration by parts) and 
evaluate the integrals by Gauss-Lobatto-Legendre quadrature 

,1 TV 



N 

^Uj{t)lpj{x) = 


N-l 


j=Q 


i=i 


N 


N-l 


Y,h,{t)^j{x) = 


-il^i{x) + i)N{x) + ^ hj{t)i}j{x), 




i=i 


N 


N-l 


Y,Wjm,{x) = 

j=0 


-- W,{t)^l:,{x) 
i=i 



p{x)dx ~ 



where Wj are the corresponding weights. Making use of the orthogonality of 
the basis functions, ipj{^,k) = ^j,k^ we obtain the set of SDE's 



MdtiL 



M (boDb-uoDu + uD'^u + ^^h + QudtW ) , 



Mdtb = M (boDu-uoDb + D^-^^u + ^^^+gbdtW) , 
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where o denotes the Hadamard product ((u o b)k = Ukbk), u, b, W are m = 
{N — 2)-dimensional column vectors whose components are the coefficients 
in the series expansions of u, b, Wu and Wf, respectively and where = 
diag{{dxil^j{^i),.. . ,dx'il;j{CN-i))) and = {dxxif^2{Ci), ■ ■ • , S^xV'Af-il^Ar-i))^ 
is a diagonal mxm matrix and an m-dimensional column vector respectively, 
which make sure that our approximation satisfies the boundary conditions. 
In the above equations, the mxm matrices M, D and are given by 



M = dmg{{wi,...,WN-i)), Dj^k = dxipji^k), ^j,fc = <9xxV'i(^fc)• 
We apply a first-order implicit-explicit method with time step 5 for time 
discretization and obtain the discrete-time and discrete-space equations 

{M -5vMD'^)u''+^ = M(n'' + (5(6"oD5"-'u"oL>n'' + ^'f6")) +AW^, 



where 
and 



APF„ ~ AA(0, S„) , ^Wb ~ AA(0, Sfe) (59) 

= gl5M{FsCC^Fj + FcCC^F^)M^, (60) 

Efc = gl5M{FsCC^Fj + F,CC''F^)M^, (61) 

C = diag((ai, . . . ,a„)), (62) 

Fs = (sin(7r),sin(27r),...,sin(m7r))(^i,^2,---,Cm)'^, (63) 

F, = (cos(V2),cos(37r/2),...,cos(m7r/2))(ei,6,---,U)^- (64) 



For our choice of ak, f^k in (58), the state covariance matrices and E;, 
are singular if m > 10. To diagonalize the state covariances we solve the 
symmetric eigenvalue problems 13^ 



(M - 5vMD^)vu = E„?;„A", 
(M - 6MD'^)vb = Efe^;fcA^ 

and define the linear coordinate transformations 

u = Vu{xu, Vu)'^ , h = Vh{xb,yhf, (65) 

where the columns of the m x m-matrices Vu and Vh are the eigenvectors 
of Vu, Vb respectively. The discretization using Legendre spectral elements 
works in our favor here, because the matrices M and are symmetric so 
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that we can diagonalize the left hand side simultaneously with the state 
covariance matrix to obtain 

y-a — 9u{x^,y^,Xij,yij ), 

= Mx:,y:,xly^) + AW,\ 
= g,ix:,y:,xlyl^), 

where /„, are 10-dimensional vector functions, gu,9b are (m— 10)-dimensional 
vector functions and where 

W: ~ AA(0,diag((A^A^,...,A5'o))), 
~ AA(o,diag((A?,A^,...,A?o))). 

We test the convergence of our approximation as follows. To assess 
the convergence in the number of grid-points in space, we define a refer- 
ence solution using N = 2000 grid-points and a time step of 5 = 0.002. 
We compute another approximation of the solution, using the same (dis- 
crete) BM as in the reference solution, but with another number of grid- 
points, say = 500. We compute the error at t = T = 0.2, = 
II (tt5oo(x,r)^, 6500(3;, T)^) - {uRef{x,T)'^,bRef{x,T)'^)\\, where || • || de- 
notes the Euclidean norm, and store it. We repeat this procedure 500 times 
and compute the mean of the error norms. The results are shown in the left 
panel of Figure [2j We observe a super algebraic convergence as expected 
from a spectral method. 

Similarly, we check the convergence of the approximation in the time 
step by computing a reference solution with Nji^f = 1000 and duef = 2^^^. 
Using the same BM as in the reference solution, we compute an approx- 
imation with time step 5 and compute the error at t = T = 0.2, et = 
II {us{x,T)'^ ,bsix,T)'^) - {uRef{x,T)'^ ,bRef{x,TY) II, and store it. We re- 
peat this procedure 500 times and then compute the mean of these error 
norms. The results are shown in the right panel of Figure [2j We observe a 
first order decay in the error as is expected. The scheme has converged for 
time steps smaller than 5 = 0.002, so that a higher resolution in time does 
not improve the accuracy of the approximation. Moreover, the Courant- 
Friederichs-Lewy condition limits the time step for a given number of nodes. 
The limit here is quite strict because the Legendre elements accumulate 
grid-points close to the boundaries so that the smallest spacing between 
grid-points is very small, even for a moderate number of nodes. 
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Figure 2: Convergence of discretization scheme for geomagnetic equations. 
Left: Convergence in the number of spatial grid-points. Right: Convergence 
in the time step. 



Here we are satisfied with an approximation with 5 = 0.002 and = 300 
grid-points in space as in |17J. The relatively small number of spatial grid- 
points is sufficient because the noise is very smooth in space and because 
the Legendre spectral elements accumulate nodes close to the boundaries 



and, thus, represent the steep boundary layer, characteristic of (53)-(54), 
well even if is small (see also [17]). 



4.2 Filtering results 

We apply the implicit particle filter with gradient descent minimization and 
random maps (see Algorithm [2] in Section [s]) , the simplified implicit parti- 
cle filter (see Section 2.2) adapted to models with partial noise, a standard 
EnKF (without localization or infiation), as well as a standard SIR filter 
to the test problem (53)-(54). The numerical model is given by the dis- 



cretization described in the previous section with a random initial state. 
The distribution of the initial state is Gaussian with mean u{x, 0), b{x, 0) as 
in (55)-(56) and with a covariance Ilu,'^b given by (60)-(61). In Figure [sj 
we illustrate the uncertainty in the initial state and plot 10 realizations of 
the initial state (grey lines) along with its mean (black lines). We observe 
that the uncertainty in uq is small compared to the uncertainty in bo. 

The data are the values of the magnetic field 6, measured at k equally 
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Figure 3: Uncertainty in the initial state. Left: u{x,0) (unobserved). Right: 
b{x,0) (observed). Black: mean. Grey: 10 realizations of the initial state. 



spaced locations in [0, 1] and corrupted by noise: 

z^=H¥^^^ + sV\ (66) 

where s = 0.001 and where H is a k x m-matrix that maps the numerical 
approximation b (defined at the GLL nodes) to the locations where data is 
collected. We consider data that are dense in time (r = 1) as well as sparse 
in time (r > 1). The data are sparse in space and we consider two cases: 
(?) we collect the magnetic field at 200 equally spaced locations; and (ii) we 
collect b at 20 equally spaced locations. The variables u are unobserved and 
it is of interest to study how the various data assimilation techniques make 



use of the information in b to update the unobserved variables u 17 , 18 . 

To assess the performance of the filters, we ran 100 twin experiments. A 
twin experiment amounts to: (i) drawing a sample from the initial state and 
running the model forward in time until T = 0.2 (one fifth of a magnetic 
diffusion time [It]) (ii) collecting the data from this free model run; and 
{Hi) using the data as the input to a filter and reconstructing the state 
trajectory. Figure |4] shows the result of one twin experiment for r = 4. 

For each twin experiment, we calculate and store the error at T = 0.2 in 
the velocity, 6^ = 11 u{x, T) — UFiiter{x-, 7") 1 1 / 1 1 u{x, T) 1 1, and in the magnetic 
field, efe = II b{x,T) — bpiiter{x,T) \ \ / \ \ b{x,T) \ \. After running the 100 twin 
experiments, we calculate the mean of the error norms (not the mean error) 
and the variance of the error norms (not the variance of the error). All 
filters we tested were "untuned," i.e. we have not adjusted or inserted any 
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Figure 4: Outcome of a twin experiment. Black: true state u{x, T) (left) 
and b{x,T) (right). Grey: reconstruction by implicit particle filter with 4 
particles. 



free parameters to boost the performance of the filters. 

Figure [5] shows the results for the implicit particle filter, the EnKF as 
well as the SIR filter for 200 measurement locations and for r = 10. It is 
evident from this figure that the implicit particle filter requires only very 
few particles to yield accurate state estimates with less than 1% error in 
the observed variables. The SIR filter with 1000 particles gives significantly 
larger errors (about 10% in the observed variables) and much larger variances 
in the errors. The EnKF requires about 500 particles to come close to the 
accuracy of the implicit filter with only 10 particles. 

In the experiments, we observed that the minimization in implicit par- 
ticle filtering typically converged after 4-10 steps (depending on r, the gap 
in time between observations). The convergence criterion was to stop the 
iteration when the change in Fj was less than 10%. A more accurate mini- 
mization did not improve the results significantly, so that we were satisfied 
with a relatively crude estimate of the minimum in exchange for a speed-up 



of the algorithm. We found A by solving (11) with Newton's method using 
a" = as initial guess and observed that it converged after about eight steps. 
The convergence criterion was to stop the iteration if \F{X) — cj) — p\ < 10~^, 
because the accurate solution of this scalar equation is numerically inexpen- 
sive. We resampled using Algorithm 2 in [l] if the effective sample size Mejj 
in (19) is less than 90% of the number of particles. 



To further investigate the performance of the filters, we run more nu- 
merical experiments and vary the availability of the data in time, as well as 
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Figure 5: Filtering results for data collected at a high spatial resolution (200 
measurement locations). The errors at T = 0.2 of the implicit particle filter 
(red), EnKF (blue) and SIR filter (black) are plotted as a function of the 
number of particles. The error bars represent the mean of the errors and 
mean of the standard deviations of the errors. 
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the number of particles. Figure [6] shows the results for the implicit particle 
filter, the simplified implicit particle filter, the EnKF and the SIR filter for 
200 measurement locations and for r = 1, 2, 4, 10. 

We observe from Figure [6j that the error statistics of the implicit parti- 
cle filter have converged, so that there is no significant improvement when 
we increase the number of particles to more than 10. In fact, the numerical 
experiments suggest that no more than 4 particles are required here. Inde- 
pendent of the gap between the observations in time, we observe an error of 
less than 1% in the observed variable. The error in the unobserved variable 
u depends strongly on the gap between observations and, for a large gap, is 
about 12%. 

The reconstructions of the observed variables by the simplified implicit 
particle filter are rather insensitive to the availability of data in time and, 
with 20 particles, the simplified filter gives an error in u of less than 1%. 
The errors in the unobserved quantity depend strongly on the gap between 
the observations and can be as large as 15%. Here, we need more particles, 
observe a larger error and a larger sensitivity of the errors to the availability 
of the data, because the simplified filter makes less direct use of the data, 
than the "full" implicit filter, since it generates the state trajectories using 
the model and only the final position of each particle is updated by the 
data. Thus, the error increases as the gap in time between the observa- 
tions becomes larger. Again, the error statistics have converged and only 
minor improvements can be expected if the number of particles is increased 
beyond 20. 

The SIR filter also makes less efficient use of the data so that we require 
significantly more particles, observe larger errors as well as a stronger depen- 
dence of the errors on the availability of data in time, for both the observed 
and unobserved quantities. With 1000 particles and for a large gap (r = 10), 
the SIR filter gives mean errors of 10% for the observed quantity and 22% 
for the unobserved quantity. An increase in the number of particles did not 
decrease these errors. The EnKF performs well and, for about 500 particles, 
gives results that are comparable to those of the implicit particle filter. The 
reason for the large number of particles is, again, the indirect use of the data 
in EnKF. 

The errors in the reconstructions of the various filters are not Gaussian, 
so that an assessment of the errors based on the first two moments is in- 
complete. In the two panels on the right of Figure [7j we show histograms 
of the errors of the implicit filter (10 particles), simplified implicit filter (20 
particles), EnKF (500 particles) and SIR filter (1000 particles) for r = 10 
model steps between observations. We observe that the errors of the im- 
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Filtering Results for High Spatial Resolution of Data: 
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Figure 6: Filtering results for data collected at a high spatial resolution (200 
measurement locations). The errors at T = 0.2 of the simplified implicit 
particle filter (upper left), implicit particle filter (upper right), SIR filter 
(lower left) and EnKF (lower right) are plotted as a function of the number 
of particles and for different gaps between observations in time. The error 
bars represent the mean of the errors and mean of the standard deviations 
of the errors. 
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Figure 7: Histogram of errors at T = 0.2 of the implicit filter, simplified 
implicit filter, EnKF and SIR filter. Left: data are available at a high 
spatial resolution (200 measurement locations) and every r = 10 model 
steps. Right: data are available at a low spatial resolution (20 measurement 
locations) and every r = 10 model steps. 
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plicit filter, simplified implicit filter and EnKF are centered to the right of 
the diagrams (at around 10% in the unobserved quantity u and about 1% 
for the observed quantity b) and show a considerably smaller spread than 
the errors of the SIR filter, which are centered at much larger errors (20% 
in the unobserved quantity u and about 9% for the observed quantity b) . A 
closer look at the distribution of the errors thus confirms our conclusions we 
drew from an analysis based on the first two moments. 

We decrease the spatial resolution of the data to 20 measurement loca- 
tions and show filtering results from 100 twin experiments in Figure [Sj The 
results are qualitatively similar to those obtained at a high spatial resolution 
of 200 data points per observation. We observe for the implicit particle filter 
that the errors in the unobserved quantity are insensitive to the spatial res- 
olution of the data, while the errors in the observed quantity are determined 
by the spatial resolution of the data and are rather insensitive to the tempo- 
ral resolution of the data. These observations are in line with those reported 
in connection with a strong 4D-Var algorithm in |17| . All other filters we 
have tried show a dependence of the errors in the observed quantity on the 
temporal resolution of the data. Again, the reason for the good performance 
of the implicit particle filter is its direct use of the data. The two panels to 
the left of Figure [Tj show histograms of the errors of the implicit filter (10 
particles), simplified implicit filter (20 particles), EnKF (500 particles) and 
SIR filter (1000 particles) for r = 10 model steps between observations. The 
results are qualitatively similar to the results we obtained at a higher spatial 
resolution of the data and the closer look at the distributions of the errors 
confirms the conclusions we drew from an analysis of the first two moments. 

In summary, we observe that the implicit particle filter yields the low- 
est errors with a small number of particles for all examples we considered, 
and performs well and reliably in this application. The SIR and simplified 
implicit particle filters could not reach the accuracy of the implicit particle 
filter, even when the number of particles is very large. The EnKF requires 
about 500 particles to come close to the accuracy of the implicit particle 
filter with only 4 particles. Although the implicit filter uses the computa- 
tionally most expensive particles, the small number of particles required for 
a very high accuracy make the implicit filter the most efficient filter for this 
problem. The partial noise works in our favor here, because the dimension of 
the space the implicit filter operates in is 20, rather than the state dimension 
600. 

Finally, we wish to compare our results with those in 17 , where a strong 
constraint 4D-Var algorithm was applied to the deterministic version of the 
test problem. Fournier used "perfect data," i.e. the observations were not 
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Figure 8: Filtering results for data collected at a low spatial resolution (20 
measurement locations). The errors at T = 0.2 of the simplified implicit 
particle filter (upper left), implicit particle filter (upper right), SIR filter 
(lower left) and EnKF (lower right) are plotted as a function of the number 
of particles and for different gaps between observations in time. The error 
bars represent the mean of the errors and mean of the standard deviations 
of the errors. 
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corrupted by noise, and applied a conjugate-gradient algorithm to minimize 
the 4D-Var cost function. The iterative minimization was stopped after 
5000 iterations. With 20 observations in space and a gap of r = 5 model 
steps between observations, an error of about 1.2% in u and 4.7% in b was 
achieved. With the implicit filter, we can get to a similar accuracy at the 
same spatial resolution of the data, but with a larger gap of r = 10 model 
steps between observations. Moreover, the data assimilation problem we 
solve here is somewhat harder than the strong constraint 4D-Var problem 
because we allow for model errors. The implicit particle filter also reduces 
the memory requirements because it operates in the 20-dimensional subspace 
of the forced variables. Each minimization is thus not as costly as a 600- 
dimensional strong constraint 4D-Var minimization. 

5 Conclusions 

We considered implicit particle filters for data assimilation. Previous im- 
plementations of the implicit particle filter rely on finding the Hessians of 
functions Fj of the state variables. Finding these Hessians can be expensive 
if the state dimension is large and can be cTimbcrsome if the second deriva- 
tives of the Fj's arc hard to calculate. We presented a new implementation 
of the implicit filter combining gradient descent minimization with random 
maps. This new implementation avoids the often costly calculation of the 
Hessians and, thus, reduces the memory requirements compared to earlier 
implementations of the filter. 

We have considered models for which the state covariance matrix is sin- 
gular or ill-conditioned. This happens often, for example, in geophysical 
applications in which the noise is smooth in space or if the model includes 
conservation laws with zero uncertainty. Previous implementations of the 
implicit filter are not applicable here and we have shown how to use our 
new implementation in this situation. The implicit filter is found to be 
more efficient than competing methods because it operates in a space whose 
dimension is given by the rank of the state covariance matrix rather than 
the model dimension. 

We applied the implicit filter in its new implementation to a test prob- 
lem in geomagnetic data assimilation. The implicit filter performed well in 
comparison to other data assimilation methods (SIR, EnKF and 4D-Var) 
and gave accurate state estimates with a small number of particles and at a 
low computational cost. We have studied how the various data assimilation 
techniques use the available data to propagate information from observed 
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to unobserved quantities and found that the impUcit particle filter uses the 
data in a direct way, propagating information to unobserved quantities faster 
than competing methods. The direct use of the data is the reason for the 
very small errors in reconstructions of the state. 
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